Cost-Saving Effect of Crowdsourcing Learning

نویسندگان

  • Lu Wang
  • Zhi-Hua Zhou
چکیده

Crowdsourcing is widely adopted in many domains as a popular paradigm to outsource work to individuals. In the machine learning community, crowdsourcing is commonly used as a cost-saving way to collect labels for training data. While a lot of effort has been spent on developing methods for inferring labels from a crowd, few work concentrates on the theoretical foundation of crowdsourcing learning. In this paper, we theoretically study the cost-saving effect of crowdsourcing learning, and present an upper bound for the minimally-sufficient number of crowd labels for effective crowdsourcing learning. Our results provide an understanding about how to allocate crowd labels efficiently, and are verified empirically.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

متن کامل

Winning by Learning? An Exploratory Study of Knowledge Sharing and Usage on Crowdsourcing Platforms

A crowdsourcing platform enables solution seekers to use contributions from a group of online users, usually by organizing crowdsourcing contests through which online users contribute to the solution generation process. Knowledge sharing on a crowdsourcing platform could play an important role in the process of crowdsourcing contests and contestants generating high-quality solutions. On one han...

متن کامل

Active Learning with Amazon Mechanical Turk

Supervised classification needs large amounts of annotated training data that is expensive to create. Two approaches that reduce the cost of annotation are active learning and crowdsourcing. However, these two approaches have not been combined successfully to date. We evaluate the utility of active learning in crowdsourcing on two tasks, named entity recognition and sentiment detection, and sho...

متن کامل

Learning and Feature Selection under Budget Constraints in Crowdsourcing

The cost of data acquisition limits the amount of labeled data available for machine learning algorithms, both at the training and the testing phase. This problem is further exacerbated in real-world crowdsourcing applications where labels are aggregated from multiple noisy answers. We tackle classification problems where the underlying feature labels are unknown to the algorithm and a (noisy) ...

متن کامل

Mentor: A Visualization and Quality Assurance Framework for Crowd-Sourced Data Generation

Crowdsourcing is a feasible method for collecting labeled datasets for training and evaluating machine learning models. Compared to the expensive process of generating labeled datasets using dedicated trained judges, the low cost of data generation in crowdsourcing environments enables researchers and practitioners to collect significantly larger amounts of data for the same cost. However, crow...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016